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1 Introduction 

The statistical analysis of covariance matrices occurs in many important applications, e.g. in diffusion 
tensor imaging and longitudinal data analysis. We consider the situation where it is of interest to 
estimate an average covariance matrix, describe its anisotropy, to carry out principal geodesic analysis 
and to interpolate between covariance matrices. 

An important difference with standard statistical techniques is that non-Euclidean distances are 
most natural for comparing covariance matrices, which are symmetric, semi-positive definite matrices. 



2 Diffusion tensors 

In medical image analysis a particular type of covariance matrix arises in diffusion weighted imaging 
called a diffusion tensor. The diffusion tensor is a 3 x 3 covariance matrix which is estimated at each 
voxel in the brain, and is obtained by fitting a physically-motivated model on measurements from the 
Fourier transform of the molecule displacement density (Basser et al., 1994). 

In the diffusion tensor model the water molecules at a voxel diffuse according to a multivariate 
normal model centred on the voxel and with covariance matrix S. The displacement of a water 
molecule x 6 R 3 has probability density function 

= ( 27 r)3/2| S |i/2 exp(-Vs-M- 

The convention is to call D = S/2 the diffusion tensor, which is a symmetric positive semi-definite 
matrix. The diffusion tensor is estimated at each voxel in the image from the available MR images. 
The MR scanner has a set of magnetic field gradients applied at directions g\,g%,..., g m 6 RP 2 with 
scanner gradient parameter b, where RP 2 is the real projective space of axial directions (with gj = —gj, 



\\gj\\ = 1). The data at a voxel consist of signals (Zq, Z\, . . . , Z m ) which are related to the Fourier 
transform of the displacement density in axial direction gj £ RP 2 , j = 1, . . . ,m, and the reading 
Zq is obtained with no gradient (b = 0). The Fourier transform in axial direction g £ i?P 2 of the 
multivariate Gaussian displacement density is given by 

F{9) = j exp(iVbg T x)f(x)dx = exp(-bg T Dg), 

and the theoretical model for the signals is 

Zj = Z J 7 (g j ) = Z ex.p(-bgjDgj), j = 1, . . . , m. 

There is a variety of methods available for estimating D from the data (Zq, Z\,..., Z m ) at each voxel 
(see Alexander, 2005), including least squares regression and Bayesian estimation (e.g. Zhou et al., 
2008). Noise models include log-Gaussian, Gaussian and more recently Rician noise (e.g. Fillard et 
al., 2007). A common method for visualizing a diffusion tensor is an ellipsoid with principal axes given 
by the eigenvectors of D, and lengths of axes proportional to y/Xl, i = 1,2,3. 

If a sample of diffusion tensors is available we may wish to estimate an average diffusion tensor 
matrix, investigate the structure of variability in diffusion tensors or interpolate at higher spatial 
resolution between two or more estimated diffusion tensor matrices. 

A strongly anisotropic diffusion tensor indicates a strong direction of white matter fibre tracts, 
and plots of measures of anisotropy are very useful to neurologists. A measure that is very commonly 
used in diffusion tensor imaging is Fractional Anisotropy 

(1) ^={l^ T E(A l -A) 2 /EAf} / , 

where < FA < 1 and Aj are the eigenvalues of the diffusion tensor matrix. Note that FA ~ 1 if 
Ai >> \,i > 1 (very strong principal axis) and FA = for isotropy. In diffusion tensor imaging 
k = 3. 



3 Non-Euclidean statistics 
3.1 The Frechet mean 

When using a non-Euclidean distance d() we must define what is meant by a 'mean covariance matrix'. 
Consider a probability distribution for a k x k covariance matrix S on a Riemannian metric space 
with density f(S). The Frechet (1948) mean S is defined as 

S = arginf ± J d(S,Z) 2 f(S)dS, 

and is also known as the Karcher mean (Karcher, 1977). The Frechet mean need not be unique in 
general, although for many distributions it will be. Provided the distribution is supported only on the 
geodesic ball of radius r, such that the geodesic ball of radius 2r is regular (i.e. supremum of sectional 
curvatures is less than (it/ (2r)) 2 ), then the Frechet mean S is unique (Le, 1995). The support to ensure 
uniqueness can be very large. For example, for Euclidean spaces (with sectional curvature zero), or 
for non-Euclidean spaces with negative sectional curvature, the Frechet mean is always unique. 

If we have a sample S\, . . . , Sn of i.i.d. observations available then the sample Frechet mean is 
calculated by finding 

N 

t = arginf^d(5i,S) 2 . 
E i=i 

Uniqueness of the sample Frechet mean can also be determined from the result of Le (1995). 



3.2 Distances between covariance matrices 



We now consider specific choices of distances in order to provide estimates of a mean from the sample 
of N covariance matrices. To ensure the positive definiteness of the covariance matrices, a reparam- 
eterization can be used such as Si = QiQj, where Qi € R 3x3 . For example, Qi = chol(Si) is the 
Cholesky decomposition, where Qi is lower triangular with positive diagonal elements. Note that Qi 
and any rotation and reflection of it QiRi (Ri £ 0(3)) can result in the same Si, i.e. Si — QiQ^ — 
Q l R i (Q l R l ) T ,i = l,...,N. 

In applications there are several choices of distances between covariance matrices that one could 
consider, for example see Table [TJ 



Name 


Notation 




Form 


Estimator 


Euclidean 


dE(Si,S 2 ) 




\\Si — S 2 \\ 




Log-Euclidean 




||log(5i)-log(5 2 )|| 


Sl 


Riemannian 


dR(Si,S 2 ) 


iiio g (sr 1/2 s 2 sr 1/2 )ii 




Cholesky 


dc(Si,S 2 ) 


||chol(5i)-chol(5 2 )|| 




Root Euclidean 


<Ih(Si,S2) 


IKjl/2 (,1/2,1 
ll'-'l D 2 II 


t H 


Procrustes size-and-shape 


ds(Si,S 2 ) 


mr i?eO(fc) ||chol(5i) - chol(S 2 )-R|| 




Full Procrustes shape 


d F (Si,S 2 ) 


m fReO(fc),/3eR 


isa - ^oi(s 2 )R 




Power Euclidean 


d A (Si,S 2 ) 


1 II oa QCt\\ 
all D l D 2 II 





Table 1: Some distances between covariance matrices and notation for the corresponding Frechet mean 
estimators. 

Estimators £e, £c, ^H, ^L, given in Table[T]are straightforward to compute using arithmetic 
averages. Note that ds is obtained by optimal rotation/reflection of chol(S 2 ) onto chol(Si) using ordi- 
nary Procrustes analysis. The Procrustes based estimators S5, T,p involve the use of the Generalized 
Procrustes Algorithm, which works well in practice (see Dryden et al., 2009). The Riemannian metric 
estimator S# uses a gradient descent algorithm which is guaranteed to converge (e.g. see Pennec et 
al, 2006). In practice it is similar to the log-Euclidean estimator (Arsigny et al., 2007). 

We briefly summarize some of the properties of the distances. All these distances are invariant 
under simultaneous rotation and reflection of Si and S 2 , i.e. the distances are unchanged by replacing 
both Si by VSiV T , V G 0(k),i = 1,2. Metrics d^Q, c?_r(), dp{) are invariant under simultaneous 
scaling of Si,i = 1,2, i.e. replacing both Si by (3Si. Metric d^Q is also affine invariant, i.e. the 
distances are unchanged by replacing both Si by ASiA T , i = 1, 2 where A is a general k x k full rank 
matrix. Metrics c?l(), ^r() have the property that d(A, 1^) = d{A~ l , 7^.), where Ik is the k x k identity 
matrix, and d^Q, duQ, dpQ are not valid for comparing rank deficient covariance matrices. Finally, 
there are problems with extrapolation with metric c?e() : extrapolate too far and the matrices are no 
longer positive semi-definite (Arsigny et al., 2007). 

An alternative anisotropy measure to FA in (pQ) is to use the full Procrustes shape distance to 
isotropy where 

PA = y^dH4,S) = |^ T ^(v / A J -7A) 2 /E^} ' > 

where vA = \ y/Xl . We include the scale factor when defining the Procrustes Anisotropy (PA) , and 
so < PA < 1, with PA = indicating isotropy, and PA ~ 1 indicating a very strong principal axis. 



Another anisotropy measure based on metrics or dji is the geodesic anisotropy 



GA = < J^(log Xi - log A) 2 




where < GA < oo (Arsigny et al., 2007), which has been used in diffusion tensor analysis in medical 
imaging with k = 3. Alternatively one could consider tanh(GA) (Batchelor et al., 2005) which is on 
the scale [0, 1). 

In some applications covariance matrices are close to being deficient in rank. For example when 
FA or PA are equal to 1 then the covariance matrix is of rank 1. The Procrustes metrics can easily 
deal with deficient rank matrices, which is a strong advantage of the approach. 



4 Interpolation methods 

4.1 Weighted Generalised Procrustes Analysis 

Frequently in diffusion tensor imaging it is of interest to interpolate between sets of tensors. The 
weighted Frechet sample mean of Si, Sn at N voxels with a certain distance function d() is defined 
by: 

N 

(2) S = a rgmfY,md(Si,S) 2 , 

i=i 

where the weights Wi are proportional to a function of the Euclidean distance between locations of 
the tensors (voxels), < Wi < 1 and J2iLi w i = !• 

We choose ds for the distance and then Weighted Generalized Procrustes analysis (WGPA) is 
proposed to obtain the weighted mean of Si, Sat. The objective of WGPA under rotation and 
reflection is to minimise a sum of weighted squared Euclidean norms Swgpa which is given by 

N n 

Swgpa{S\, Sjv) = inf y^Wi || QiRi - ^wjQjRj || 2 
R U ...,R N . =1 j=1 

N 

= inf ^2 Wi || (1 - Wi)QiRi - ^WjQjRj || 2 

Ri,—,Rn ■ i .,. 

i=l j^i 

n 1 

(3) = r3i* S o^f 11 Q%Ri " (i^y g WjQjRj f • 

Let Ri,i = 1, N be the estimates of the rotation matrices. Then, the WGPA mean tensor is given 
by 

(4) Swgpa = QwgpaQwgpa> 

N 

where Qwgpa = "WiQiRi- We give Algorithm [1] for estimating R\,i = 1,...,N. Note that the 
i=l 

algorithm is guaranteed to converge to a local minimum as the reduction in S c at each iteration is at 
least zero. 



Algorithm 1 Weighted Generalised Procrustes Method 



1: Initial setting: Qf «— chol(Di), i = l,...,N 
2: Swgpa from previous iteration: S p <— 

N N 

3: Swgpa from current iteration: S c <— J2 w i II Qf ~ E w jQf || 2 

i=i j=i 

4: while jSp — S'cl > tolerance do 
5: for i = 1 to N do 

7: Calculate the rotation matrix i?j which minimises || Qi — Qf Ri || with partial ordinary 

Procrustes analysis 
8: Qf i- QfRi 

9: end for 

10: S p <- S c 

N N 

ii: S c ^J2m II Qf - E u^Qf || 2 

i=l j=l 

12: end while 

N 

13: QvKGPA <~ E ^iQ t ^ 
i=l 

14: return Qwgpa 



4.2 Regularization 

In medical image analysis a noisy tensor field may be available and so we wish to carry out regular- 
ization. For example, consider a grid of tensors Si, . . . , S n at voxels xi,...,x n and we wish to predict 
the tensor at a new site x. We could use the weighted penalized predictor obtained by minimizing, 
with respect to S, 

n 

S^(A) = u»idist(Si, E)0 + Adist(S, 

i=l 

where the weights lOj > 0, w i = 1 are functions of the distance to the new site, A > is a regular- 
ization parameter, and [i is a reference matrix, such as the identity matrix, zero matrix or an overall 
average. For example we could use W{ oc exp{— 7||s — Xi\ || 2 }, i = 1, . . . , n. 

Consider now smoothing across an image at the voxels xi,..., x n , and so we need to minimize, 
with respect to = 1, . . . , n, 

n n n 

E *%dist(Si, + A dist(S„ 

j=l i=l j=l 

and is the weight as a function of the distance between sites i and j. Note (f3,u) = (2,0) gives 
the weighted Frechet mean, if (/3,cu) = (/?, 0) we have a type of M-estimator (Kent, 1992; Dryden and 
Mardia, 1998, p298), if (/3,w) = (1,0) we have the geometric median (Fletcher and Joshi, 2009), if 
(J3,cj) = (2,2) non-Euclidean type of ridge-regression, and if (f3,u)) = (2, 1) a non-Euclidean type of 
LASSO (see Tibshirani, 1996). Note that for the power metric (and Euclidean and square root) the 
space is Euclidean, and so using this procedure is relatively straightforward in this case. 

5 Applications 

5.1 Anisotropy of diffusion tensors 

We consider anisotropy of estimated diffusion tensors in the brain obtained from diffusion weighted 
images (see Dryden et al., 2009). In Figure Q] we see a coronal view of the brain, and the corpus 



callosum and cingulum can be seen. 




Figure 1: The anisotropy measures (top left) FA, (top right) PA, (bottom left) GA and (bottom right) 
tanh{GA) 

At first sight all three anisotropy measures appear broadly similar. However, the PA image 
offers more contrast than the FA image in the highly anisotropic region - the corpus callosum. Also, 
the GA image has rather fewer brighter areas than PA or FA. The plot of tanh(GA) is most different 
from the others, with much fewer dark areas. Due to the improved contrast we believe PA is slightly 
preferable in this example. 

-^/^a H I ^ I I \ 




Figure 2: Principal geodesic analysis for covariance matrices. The true geodesic path is given in the 
penultimate row (black). We then add noise in the three initial rows (red). Then we estimate the mean 
and find the first principal component (yellow), displayed in the bottom row. 



5.2 Principal geodesies of covariance matrices 

We consider now an example estimating the principal geodesies of the covariance matrices Si,...,S n 
using the Procrustes size-and-shape metric ds (see Dryden et al., 2009). Huckeman et al. (2009) 
discuss geodesic principal components analysis in Riemannian manifolds in depth. We consider an 
approximate procedure where the principal geodesies are estimated by principal components analysis 
of the tangent space co-ordinates. In Figure O we consider a true geodesic path (black) and evaluate 
11 equally spaced covariance matrices along this path. We then add i.i.d. Gaussian noise in the 
tangent space for three separate realisations of noisy paths (in red). The overall mean Eg is computed 



based on all the data (n = 33), and then the Procrustes size-and-shape tangent space co-ordinates 
are obtained based on the Cholesky decompositions of the covariance matrices. The first principal 
component loadings are computed and projected back to give an estimated minimal geodesic in the 
covariance matrix space. We plot this path in yellow by displaying 11 covariance matrices along the 
path. It can be seen that the estimated principal geodesic is very similar to the true geodesic path 
here. Other extensions include curve fitting through paths of covariance matrices using polynomials 
and geodesies (e.g. see Evans et al., 2009, for some examples of shape curves). 

5.3 Interpolation 

A tensor field from a healthy human brain has been smoothed and interpolated (with 2 interpolations 
between each pair of original voxels). The Fractional Anisotropy (FA) maps from the processed tensors 
are shown in Figure [3l Obviously, the FA map from the processed tensor data is much smoother than 
the one without processing. The feature that the cingulum is distinct from the corpus callosum is 
clearer in the anisotropy map from the processed data than those without processing in Figure El 




Figure 3: Smoothing and interpolation of the diffusion tensor data from human brain, a: FA map 
from Bayesian tensor field, c: FA map from processed tensor field, b and d: Zoomed inset regions. 
Green arrows: the cingulum. Light blue arrows: the corpus callosum. 

5.4 Tractography 

As a final application we give some initial results of fibre tractographies of the brain stem in a healthy 
human in Figure [H It is of great interest to study the white matter fibre tracts in the brain in order to 
explore connectivity between different parts, both in healthy and patient brains. From different seed 
points in the brain stem, white matter fibres are tracked by following interpolated paths of principal 
directions from diffusion tensors. Tractography from the WGPA processed tensor field is different 
from the other methods, and work is currently underway to assess whether WGPA is preferable. 




a b c 

Figure 4: Fibre tractograhpies using the Bayesian estimates (a), Euclidean smoothing (b) and WGPA 
smoothing (c). Black arrows point out some obvious differences of the WGPA tracts compared with 
other methods. 



6 Conclusions 



Methodology for estimation and inference in the space of covariance matrices has application in many 
areas, including diffusion tensor imaging, structural tensor analysis in computer vision, and modelling 
longitudinal data with Bayesian and random effect models. There are many choices of metric avail- 
able, each with its advantages. The particular choice of what is best will depend on the particular 
application. The use of the Procrustes size-and-shape metric ds is particularly appropriate when the 
covariance matrices are close to being deficient in rank. 
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